Maximum Entropy in Nilsson's Probabilistic Logic

Author

  • Thomas B. Kane
Abstract

Nilsson's Probabilistic Logic is a set-theoretic mechanism for reasoning with uncertainty. We propose a new way of looking at the probability constraints enforced by the framework, which allows the expert to include conditional probabilities in the semantic tree, thus making Probabilistic Logic more expressive. An algorithm is presented which finds the maximum entropy point probability for a rule of entailment without resorting to solution by iterative approximation. The algorithm works for both propositional and predicate logic. Also presented are a number of methods for employing the conditional probabilities.

1. Introduction

A recent trend in reasoning with uncertainty has been to move away from representing the uncertainty of a sentence with a point probability, towards more complex mechanisms: most notably Probabilistic Logic [Nilsson 1986a, Grosof 1986a, Guggenheimer 1987a], Incidence Calculus [Bundy 1986a, Bundy 1986b], and stochastic simulation [Pearl 1987a]. All of these systems involve explicit knowledge of possible world scenarios. In Incidence Calculus, the probability of a sentence is based on a sample space of points, each of which can be regarded as a possible world. In Pearl's stochastic simulation, probabilities of events are computed by recording the fraction of time that events occur in a random series of scenarios generated from some causal model. Probabilistic Logic is a generalisation of the ordinary true-false semantics for logical sentences to a semantics that allows sentences to be uncertain, and consequently to have more than one possible state.

The consequences of their set-theoretic nature leave these systems prey to complexity problems in space and time. Bundy's legal assignment finder, which finds all the legal specialisations of an initial probability assignment, is at least exponential.
The number of runs it takes to approximate correct probability values in the stochastic simulator is of the same order, as is entailment inside Probabilistic Logic; Nilsson reports that implementation of the full procedure for probabilistic entailment would usually be impracticable. The maximum entropy principle [Levine 1979a, Bard 1982a] also needs knowledge of all the possible states of uncertain information; in this respect it is related to the possible-world approaches listed above, and shares the same complexity problems. These methods form part of what appears to be a formidable family of conceptually compelling theories of reasoning with uncertainty which all suffer from the same problem: intractability.

This paper addresses this problem for Nilsson's Probabilistic Logic, and discusses its use of the maximum entropy method. It is the coupling of this method to the semantic framework of Probabilistic Logic which is at the core of this paper. The system produced is very fast, and allows the expert to use conditional probabilities in designing the statistical distribution.

2. Entropy

Entropy [Harris 1982a] is a statistical term which has evolved from the study of thermodynamics [1977a]. It relates the probability of a thermodynamic system being in a given state to the number of different molecular configurations that the system can assume in that state. Since in general a system changes spontaneously toward a more probable state, the entropy increases accordingly. Equilibrium, or maximum entropy, is the state in which the molecules can occupy the greatest number of configurations.

More formally, the entropy of the probability mass function p_X(x) may be regarded as a descriptive quantity, just as the median, mode, variance and coefficient of skewness may be regarded as descriptive parameters. The entropy of a distribution is a measure of the extent to which the probability is concentrated on a few points or dispersed over many.
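As a sketch of this definition: for a discrete distribution, the entropy is H(p) = -Σ_i p_i log2(p_i). A minimal computation in Python, using illustrative four-point distributions of our own devising (the paper's own coin table is not reproduced in this copy):

```python
from math import log2

def entropy(dist):
    """Shannon entropy sum(-p * log2(p)) in bits; zero-probability points contribute 0."""
    return sum(-p * log2(p) for p in dist if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally dispersed over four points
skewed  = [0.5, 0.25, 0.125, 0.125]  # partially concentrated
certain = [1.0, 0.0, 0.0, 0.0]       # fully concentrated: no uncertainty

print(entropy(uniform))  # 2.0 bits (the maximum for four points)
print(entropy(skewed))   # 1.75 bits
print(entropy(certain))  # 0.0 bits
```

The uniform distribution attains the maximum of 2 bits; concentrating the mass lowers the entropy, which is exactly the ordering exhibited by the counterfeit-coin example that follows.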
It is an expression of the degree of disorder of a system. In the examples here we will use 2 as the logbase, although any base can be used [Harris 1982a].

Example. We are provided with four coins and told that one of them is counterfeit. Four probability distributions are given below, where the coins are labelled 1 to 4, and p_n represents the probability that coin n is the counterfeit coin. The distributions are labelled D1 to D4, and the entropy is labelled H.

[Table of distributions D1 to D4 and their entropies H not reproduced in this copy.]

In this case, the distribution with maximum entropy is D1. The reduction in entropy from D1 to D4 demonstrates the effect of having more information about the change in probabilistic likelihood of one of the coins over the others. D4 represents the case where there is no uncertainty as to which coin is counterfeit. Information is embodied in each of the distributions, and we can see that the distribution which says least about the identity of the counterfeit coin is D1. This equation of information with entropy leads to the maximum entropy principle: of all probability distributions which satisfy the constraints imposed by the known aggregate probabilities, choose the one which has the maximum entropy or, equivalently, contains the least information.

3. Probabilistic Entailment and Context

Nilsson defines probabilistic entailment as an analogue of logical entailment in classical logic. The rule of modus ponens allows us to use the set {A1, A1=>B} to deduce {B}. When we have uncertainty about whether or not A1 or A1=>B is true, the real world, which has the real value of B, becomes a random variable, and can be in one of a number of possible states. These states (possible worlds) can be produced mechanically by an exhaustive theorem prover [Chang 1973a], and the collected group is called a semantic tree. In conventional set-theoretic terms, this set of all possibilities is the universal set.
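The mechanical production of the base set can be sketched by brute-force enumeration of truth assignments — a toy stand-in for the exhaustive theorem prover, applied to the set {A1, A1=>B} (variable names are ours):

```python
from itertools import product

# Enumerate truth assignments to (A1, B) and record the truth values of the
# sentences T, A1 and A1 => B in each; assignments that agree on all three
# sentence values collapse into a single possible world.
worlds = {}
for a1_val, b_val in product([True, False], repeat=2):
    rule = (not a1_val) or b_val          # material implication A1 => B
    key = (True, a1_val, rule)            # truth values of (T, A1, A1 => B)
    worlds.setdefault(key, []).append((a1_val, b_val))

for key, assignments in sorted(worlds.items(), reverse=True):
    print(key, assignments)
```

Three distinct worlds emerge: one with A1 and the rule both true (B forced true), one with the rule false (B forced false), and one with A1 false, which covers both values of B — the world labelled c below.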
In statistical terms, this set is called the sample space or possibility space. As an example, Nilsson uses the set {A1, A1=>B} to estimate the probability of the logically entailed sentence B. A complete interpretation table for the worlds which form the base set for the inference is:

[Interpretation table not reproduced in this copy.]

The possible worlds are labelled with the small letters a, b and c. Each possible world must eventually be assigned a non-zero probability such that if the probability of a sentence S is p(S), and S is true in worlds a and b, then p(a) + p(b) = p(S). The tautology T is true in all possible worlds and is included in the set to ensure that all the probabilities sum to 1.

Structurally, world c in this example is the world which causes concern. Nilsson presented the states for the semantic tree as:

[Semantic tree (figure 3) not reproduced in this copy.]

The reasoning behind this is that in worlds a and b, B can only assume one logical value, 1 and 0 respectively, but in world c, B can logically assume either of the values 1 or 0. Hence, in figure 3, c represents the world where B is false, and d represents the world where B is true. However, in the semantic tree of figure 3 there is no way of distinguishing between the worlds c and d, because they are the same world: the world where A1 is false and the rule A1=>B is true. Figure 3 also imposes an unnecessary condition on the relationship between the possible worlds (¬A1, A1=>B, B) and (¬A1, A1=>B, ¬B), namely that they have the same probability. In this sense, figure 3 incorporates information into our reasoning process which is not necessarily true.

In appendix B we show that 2^n + 1 possible worlds are created from the tree in figure 2, where n is the number of propositions in the antecedent list of the rule. Effectively, we are left with n equations and 2^n possible worlds to solve for. One way to remove the additional degrees of freedom is to maximise the entropy of the system.
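For the one-proposition case (n = 1) the maximum-entropy assignment can be written down directly. A minimal sketch, assuming p(A1) + p(A1=>B) >= 1 so that all world probabilities are non-negative (function and variable names are ours, not the paper's):

```python
def max_entropy_p_of_b(p_a, p_rule):
    """Maximum-entropy estimate of p(B) given p(A1) and p(A1 => B).

    Consistent worlds for the set {A1, A1 => B}:
      w1: A1 true,  rule true,  B true
      w2: A1 true,  rule false, B false
      w3: A1 false, rule true,  B true
      w4: A1 false, rule true,  B false
    The constraints fix p(w2) = 1 - p_rule and p(w1) = p_a + p_rule - 1;
    the leftover mass 1 - p_a is shared by w3 and w4, and entropy is
    maximised by splitting it equally, removing the free degree of freedom.
    """
    p2 = 1.0 - p_rule
    p1 = p_a - p2                # = p_a + p_rule - 1
    p3 = (1.0 - p1 - p2) / 2.0   # = p4, the equal split
    return p1 + p3               # total probability of worlds where B is true

print(round(max_entropy_p_of_b(0.8, 0.9), 6))  # 0.8
```

Algebraically this reduces to the closed form p(B) = p(A1=>B) + (p(A1) - 1)/2, obtained with no iterative approximation.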
Bard [Bard 1982a, Bard 1980a] presents examples which employ the notion of a semantic tree, and which illustrate the following solution methods clearly. Each possible world is rewritten as a product of factors [Bard 1982a, Cheeseman 1983a, Nilsson 1986a], where an unknown factor is associated with each of the sentences in the database. We shall use the following notation: a1 represents the factor for the tautology; the factor aj is associated with proposition j; and the factor aR is associated with the rule. We include a factor in the multiplication list for a possible world only if the world has a one in the corresponding row of the semantic tree. So, in figure 2, we have:
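The product list itself is truncated in this copy. Applying the stated inclusion rule to the three worlds a, b, c of the figure-2 tree would give p(a) = a1·aA·aR, p(b) = a1·aA and p(c) = a1·aR, where aA is the factor for proposition A1; this is our reconstruction, not the paper's own display. A sketch of solving these products against the sentence probabilities, again assuming p(A1) + p(A1=>B) > 1:

```python
def world_factors(p_a, p_rule):
    """Solve for the factors a1 (tautology), aA (proposition A1), aR (rule)
    such that p(a) = a1*aA*aR, p(b) = a1*aA, p(c) = a1*aR.

    The three constraints p(a) + p(b) = p(A1), p(a) + p(c) = p(A1 => B)
    and p(a) + p(b) + p(c) = 1 determine the world probabilities uniquely,
    and the factors then follow by division.
    """
    pa = p_a + p_rule - 1.0  # world a: A1 true, rule true
    pb = 1.0 - p_rule        # world b: A1 true, rule false
    pc = 1.0 - p_a           # world c: A1 false, rule true
    a1 = pb * pc / pa
    aA = pa / pc
    aR = pa / pb
    return a1, aA, aR

a1, aA, aR = world_factors(0.8, 0.9)
# The factor products recover the world probabilities:
print(round(a1 * aA * aR, 6), round(a1 * aA, 6), round(a1 * aR, 6))  # 0.7 0.1 0.2
```

No iteration is involved: the factors fall out of the linear constraints in closed form, which is the point the paper goes on to develop.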





Publication date: 1989